Publication Updated on May 19 2020
In the context of the 2020 Coronavirus pandemic, data presentation has an underappreciated role in guiding both public perception and policy. Concerned Americans are filled with comparative questions which are incredibly difficult to answer with tabular data. Are fewer people dying? Is my state faring worse than others? Is the recovery similar across all states? The visualizations that we use to translate the data are critical to understanding and making these comparisons. To begin with, the figure below is a reproduction of the chart displayed at many recent White House press briefings. This chart places the daily cumulative total of deaths for all 50 states on a shared set of axes. This is the bare minimum of data visualization, and its design actually obfuscates important information.
The scale is set by the Cumulative Deaths in New York state, which is far above the other states. This compresses the rest of the data points to an unreadable spread.
The slope of the Cumulative Deaths is the number of new deaths per day - a key metric in tracking coronavirus recovery. However, the graphic leaves the determination of that slope to the viewer’s eyeball.
The only effective baselines for comparison are 0 and the Cumulative Deaths for New York. Without any kind of national total or average, the viewer is left to judge their state’s impact only in reference to New York, an arbitrary baseline.
The graphic shown by the White House, like similar ones from other media groups, lacks interactive filtering and tooltips. The viewer is limited to interpreting the data as it is statically presented, and the only way to highlight a data point is to point a finger at it.
Don’t get me started on the legend.
The issues above can be addressed with two main changes to the graphic:
Introduce interactivity to the graphic so that the viewer can filter it and hover over data points. Ideally, the viewer should be able to highlight groups within the graphic for comparison.
Produce, without clutter, metrics other than Cumulative Deaths, so that the nuances of the changes in coronavirus fatality can be fully understood.
With Plotly, this can be accomplished relatively simply. Plotly enables interactive plots from R, with excellent features for filtering, panning, and comparing data at shared x-values. This addresses Change 1. By placing Cumulative Deaths, New Deaths per Day, and New Deaths per Day per 100,000 on a shared axis of Date, we accomplish Change 2.
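The two derived metrics follow directly from the cumulative series: new deaths per day is the difference between consecutive cumulative totals, and the per-100,000 figure scales that difference by population. The sketch below illustrates the arithmetic only. It is not the code behind the plots, the function name `daily_metrics` is hypothetical, and the numbers are made up for illustration.

```python
def daily_metrics(cumulative, population):
    """Derive per-day metrics from a list of daily cumulative death totals.

    Returns two lists aligned to days 2..n, since the first day has no
    previous total to subtract from.
    """
    # New deaths per day: difference between consecutive cumulative totals.
    new_deaths = [today - yesterday
                  for yesterday, today in zip(cumulative, cumulative[1:])]
    # Scale to a rate per 100,000 residents so regions of different
    # sizes can be compared on one axis.
    per_100k = [d * 100_000 / population for d in new_deaths]
    return new_deaths, per_100k


# Made-up numbers for illustration only, not real coronavirus data.
cumulative = [10, 25, 45, 60]
population = 2_000_000

new_deaths, per_100k = daily_metrics(cumulative, population)
print(new_deaths)  # [15, 20, 15]
print(per_100k)    # [0.75, 1.0, 0.75]
```

Note that the cumulative total keeps rising on the last day even though new deaths fall from 20 to 15. That is exactly the information a cumulative-only chart leaves to the viewer's eyeball.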
Features of the Plot:
Hover over any datapoint to see detailed information as well as supplementary info. For example, hovering over a point in the Daily Deaths graphic displays the number of Deaths due to coronavirus and the total number of deaths expected using the 2019 average.
Double-click a state in the legend to isolate that state. For example, you will notice that a state called -Nationwide- is now included in the data. Double-click this item to show only the data for Nationwide deaths.
Single-click a state to add it to or remove it from the plot. For example, after you have double-clicked Nationwide, single-click New York to compare the two areas. Then, single-click California to include it as well.
In the far upper right of the graphic, click the two-tag symbol next to the blue square. It should say Compare data on hover. Hover details for all data points will now be shown for any date that is hovered over. Try this with the Nationwide - New York - California selection.
The above rework is largely sufficient for answering the questions most common in coronavirus discourse. From the three-area comparison above, we can answer our original questions. The Daily Deaths chart shows us that the number of national daily deaths is trending downwards. The Daily Deaths per 100k People chart shows us that the fatality rate in New York is above the national average when adjusted for population, and that California is below the national average. Double-click any state to remove all filters, and then single-click Nationwide to remove the national total. The Daily Deaths per 100k People chart in the most recent dates shows us that three states are actually getting worse in proportion to their population: New Jersey, Connecticut, and Massachusetts. Filter to these states to explore more.
I was also interested in taking complete creative license with the data in order to explore novel visualizations. Rather than constraining the x-axis to time, which marches along in an entirely predictable linear fashion, why not provide the x-axis with a more interesting variable? I explored the idea of a bar chart, representing Cumulative Deaths for a region, that was filled as time went by. The idea foundered in rendering. Instead, I settled on a dot plot, where each dot is the cumulative total for that day. That way, the distance between dots indicates the speed at which the cumulative total is increasing. The result:
1. Overall Layer
The default view acts as a horizontal bar chart. States are plotted by population, but as population remains constant, each state occupies only one row of the chart. The horizontal length of each state's row represents its Cumulative Deaths at the most recent date. Of course, the Nationwide Cumulative Deaths and Population exceed those of any state. As noted previously, each dot represents a new day, so a large horizontal distance between dots represents a large number of deaths for that day. We can filter out national totals to see the states more clearly by single-clicking on Nationwide.
2. All States Layer
Filtered down to states, we can see how the states are distributed by population. This enables appropriate comparisons and identification of trends. For example, New York has the largest number of Cumulative Deaths, but is only the 4th largest in terms of population, making it a true outlier in the impact of Coronavirus. This imbalance is also true for New Jersey. Let’s filter to two states for comparison by double-clicking on Michigan and then single-clicking on New Jersey. With similar populations, it is more reasonable to compare these states.
Let’s activate the Compare data on hover tool in the upper right of the plot. By placing the cursor in the space between the two lines of dots, we can see on what date each of the two states reached the same number of cumulative deaths. For example, by hovering over the 4,000 death mark, we can see that on 5/2/2020, Michigan reached 4,020 cumulative deaths, and that New Jersey had reached 4,080 deaths back on 4/18/20. It would therefore be interesting for policy makers in Michigan to look at the policy decisions made in New Jersey back on 4/18 and see if there is anything to be learned.
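The same comparison can be made without the hover tool by looking up the first date on which each region's cumulative total crosses a chosen mark. The sketch below is an illustration rather than part of the published plots. The helper `first_date_reaching` and the two regions are hypothetical, and the totals are made up.

```python
from datetime import date


def first_date_reaching(series, threshold):
    """Return the first date whose cumulative total is >= threshold.

    `series` is a list of (date, cumulative_total) pairs in date order.
    Returns None if the threshold is never reached.
    """
    for day, total in series:
        if total >= threshold:
            return day
    return None


# Made-up cumulative totals for two hypothetical regions.
region_a = [(date(2020, 4, 1), 1000),
            (date(2020, 4, 2), 2500),
            (date(2020, 4, 3), 4100)]
region_b = [(date(2020, 4, 10), 1800),
            (date(2020, 4, 11), 3900),
            (date(2020, 4, 12), 4050)]

a_day = first_date_reaching(region_a, 4000)
b_day = first_date_reaching(region_b, 4000)
print(a_day, b_day)          # 2020-04-03 2020-04-12
print((b_day - a_day).days)  # 9
```

The difference in days is the lead time the later region has for studying decisions made in the earlier one.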
3. Single State Layer
Now, let’s isolate a single state - New York. Double-click to isolate the state. We now see a very different picture. The dots still align with the cumulative total for a given date, but we have two new pieces of information plotted on the chart.
A shaded line, which represents the number of deaths on that day.
A solid line, which represents the average change in number of deaths per day over the 2 weeks prior to that date. A slippery concept to be sure, but bear with me. This figure is calculated by taking the moving average of daily increase in deaths over the past two weeks. However, we are less concerned with the actual figure, and more with its scale and direction.
Taken together, this view gives us a sense of the scale of deaths (Cumulative Deaths), the rate of increase in those Cumulative Deaths (the shaded line representing deaths per day OR the size of the gap between dots), and the change in the deaths per day over time (the solid line OR the vertical change in the deaths per day lines).
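One way to read the solid line's definition is as a trailing mean of the day-over-day change in daily deaths. The sketch below follows that reading. It is an assumption about the calculation rather than the code used for the chart, the function name `trend_of_daily_deaths` is hypothetical, and the numbers are made up. The example uses a 3-day window to keep the arithmetic easy to follow; the article's figure uses 14 days.

```python
def trend_of_daily_deaths(daily_deaths, window=14):
    """Trailing mean of the day-over-day change in daily deaths.

    trend[i] covers the changes ending on day i + 1 of the input.
    Early entries use a shorter window until `window` changes exist.
    """
    # Day-over-day change in the number of deaths per day.
    changes = [today - yesterday
               for yesterday, today in zip(daily_deaths, daily_deaths[1:])]
    trend = []
    for i in range(len(changes)):
        recent = changes[max(0, i - window + 1): i + 1]
        trend.append(sum(recent) / len(recent))
    return trend


# Made-up daily death counts: rising at first, then falling.
daily = [100, 110, 125, 120, 105, 90]
trend = trend_of_daily_deaths(daily, window=3)
print([round(t, 2) for t in trend])
```

A positive value means deaths per day are still climbing and a negative value means they are falling. Because consecutive differences cancel, each full-window value equals the net change in daily deaths across the window divided by the window length, which is why the sign and size matter more than the exact figure.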
For New York, we can see that the deaths per day were increasing until about 4/17/2020, quickly at first and then more slowly. On 4/17, the deaths per day began a steady decrease, reflected by both the downward solid lines and the smaller shaded lines. A particularly deadly day on 5/6 is the exception to an otherwise much-improving situation in New York over the past 2 weeks, with an average decrease of 11 new deaths per day over the 2 weeks ending on 5/14.
Hopefully, this paper has given the reader some new insights into what they can and should expect from modern Data Visualizations. As Edward Tufte puts it, the guiding principle for design is thus:
Graphical Excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
I have tried my best to reimagine Coronavirus visualizations in the context of this advice, and would love to hear what you like, and especially what you don’t like, about my efforts. Please contact me at francisabritschgi@gmail.com to share whatever feedback or questions you have.